Word Class Discovery For Postprocessing Chinese Handwriting Recognition

Author

  • Chao-Huang Chang
Abstract

This article presents a novel Chinese class n-gram model for contextual postprocessing of handwriting recognition results. The word classes in the model are automatically discovered by a corpus-based simulated annealing procedure. Three other language models, least-word, word-frequency, and the powerful interword character bigram model, have been constructed for comparison. Extensive experiments on large text corpora show that the discovered class bigram model outperforms the other three competing models.
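As a rough illustration of how a class bigram model can rank competing recognition hypotheses, the sketch below uses the common factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of word w_i. This is an assumption for illustration, not necessarily the exact formulation used in the article. The toy English corpus, the hand-assigned `word2class` map (the article discovers classes automatically by simulated annealing), the add-alpha smoothing, and all function names are hypothetical.

```python
import math
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect the counts needed by a class bigram model."""
    class_bigrams = Counter()  # count(c_prev, c_cur)
    class_context = Counter()  # count of c_prev as a left context
    word_counts = Counter()    # count(w)
    class_totals = Counter()   # count(c) over all tokens
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_totals[c] += 1
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_context[prev] += 1
    return class_bigrams, class_context, word_counts, class_totals

def log_prob(sent, word2class, model, alpha=1.0):
    """Log-probability of words 2..n given word 1.

    Class transitions use add-alpha smoothing (an assumption of this
    sketch). Every word is assumed to occur in the training data.
    """
    class_bigrams, class_context, word_counts, class_totals = model
    num_classes = len(class_totals)
    total = 0.0
    for prev_w, cur_w in zip(sent, sent[1:]):
        prev_c, cur_c = word2class[prev_w], word2class[cur_w]
        p_trans = (class_bigrams[(prev_c, cur_c)] + alpha) / (
            class_context[prev_c] + alpha * num_classes)
        p_emit = word_counts[cur_w] / class_totals[cur_c]
        total += math.log(p_trans) + math.log(p_emit)
    return total

# Toy training data with hand-assigned classes (hypothetical).
corpus = [["the", "cat", "sleeps"],
          ["a", "dog", "runs"],
          ["the", "dog", "sleeps"]]
word2class = {"the": "DET", "a": "DET", "cat": "NOUN",
              "dog": "NOUN", "sleeps": "VERB", "runs": "VERB"}
model = train_class_bigram(corpus, word2class)

# Two competing hypotheses, standing in for recognizer candidates.
candidates = [["a", "runs", "cat"], ["a", "cat", "runs"]]
best = max(candidates, key=lambda s: log_prob(s, word2class, model))
```

Neither candidate word sequence occurs in the toy corpus, yet the model can still prefer one of them because its class sequence (DET, NOUN, VERB) was observed. Sharing statistics across the words of a class is the usual motivation for class-based n-gram models when word-level data are sparse.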


Similar resources

Incorporating diverse information sources in handwriting recognition postprocessing

This paper describes the proposed implementation of a new model for the linguistic postprocessing component of the Human Language Technology (HLT) project. The model was designed for handwriting recognition applications but can be used for other text recognition problems and speech recognition. We demonstrate here that the current implementation (the POS model) fails to incorporate new sources ...


Recognition of Cursive Roman Handwriting - Past, Present and Future

This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or more generally some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessin...


A New Method for Rotation Free Online Unconstrained Handwritten Chinese Word Recognition: A Holistic Approach

Most online handwriting word recognition (HWR) approaches proceed by segmenting words into isolated characters, which are recognized separately. Inspired by results in cognitive psychology, holistic word recognition approaches provide another effective way to deal with the problem of HWR. In this paper, we propose a new method for rotation free online unconstrained Chinese word recognition through a ...


The Postprocessing of Optical Character Recognition Based on Statistical Noisy Channel and Language Model

Image processing techniques have long been used in optical character recognition (OCR). The recognition method has recently evolved from early "pattern recognition" to "feature extraction", and the recognition rate has risen from 70% to 90%. But the character-by-character recognition technique has its limitations. Using language models to assist the OCR system in improving recognition ...


Evaluation of weighted Fisher criteria for large category dimensionality reduction in application to Chinese handwriting recognition

To improve the class separability of Fisher linear discriminant analysis (FDA) for large category problems, we investigate the weighted Fisher criterion (WFC) by integrating weighting functions for dimensionality reduction. The objective of WFC is to maximize the sum of weighted distances of all class pairs. By setting larger weights for the most confusable classes, WFC can improve the class se...




Publication date: 1994